cost map
Language as Cost: Proactive Hazard Mapping using VLM for Robot Navigation
Oh, Mintaek, Kim, Chan, Seo, Seung-Woo, Kim, Seong-Woo
Robots operating in human-centric or hazardous environments must proactively anticipate and mitigate dangers beyond basic obstacle detection. Traditional navigation systems often depend on static maps, which struggle to account for dynamic risks, such as a person emerging from a suddenly opening door. As a result, these systems tend to be reactive rather than anticipatory when handling dynamic hazards. Recent advances in pre-trained large language models and vision-language models (VLMs) create new opportunities for proactive hazard avoidance. In this work, we propose a zero-shot language-as-cost mapping framework that leverages VLMs to interpret visual scenes, assess potential dynamic risks, and preemptively assign risk-aware navigation costs, enabling robots to anticipate hazards before they materialize. By integrating this language-based cost map with a geometric obstacle map, the robot not only identifies existing obstacles but also anticipates and proactively plans around potential hazards arising from environmental dynamics. Experiments in simulated and diverse dynamic environments demonstrate that the proposed method significantly improves navigation success rates and reduces hazard encounters compared to reactive baseline planners. Code and supplementary materials are available at https://github.com/T
Mobile robots are increasingly deployed in everyday environments, such as homes, hospitals, warehouses, and disaster sites, where safety and context-aware navigation are critical.
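A minimal sketch of the fusion step described above, assuming the VLM's output has already been reduced to hazard annotations of the form (row, column, risk score in [0, 1], radius in cells). The names `risk_from_hazards`, `fuse`, and `LETHAL` are hypothetical, and the linear decay and weighted scaling are illustrative choices rather than the paper's formulation: occupied cells stay lethal, while hazard risk only raises the cost of free cells.

```python
import math

LETHAL = 254  # hypothetical lethal cost, mirroring the common 0-254 grid convention


def risk_from_hazards(rows, cols, hazards):
    """Paint a risk grid in [0, 1] from hazard annotations.

    hazards: list of (row, col, score, radius) tuples, assumed to come from a
    VLM that localized a risky region (e.g. a closed door) and scored it.
    Risk decays linearly from `score` at the center to 0 at `radius` cells.
    """
    risk = [[0.0] * cols for _ in range(rows)]
    for hr, hc, score, radius in hazards:
        for r in range(rows):
            for c in range(cols):
                d = math.hypot(r - hr, c - hc)
                if d < radius:
                    risk[r][c] = max(risk[r][c], score * (1.0 - d / radius))
    return risk


def fuse(occupancy, risk, weight=200):
    """Occupied cells stay lethal; free cells get a scaled, non-lethal risk cost."""
    return [
        [LETHAL if occ else min(LETHAL - 1, round(weight * r))
         for occ, r in zip(occ_row, risk_row)]
        for occ_row, risk_row in zip(occupancy, risk)
    ]


# Example: a wall on the right and a hypothetical VLM-flagged door hazard at (1, 2).
occupancy = [[0, 0, 0, 1],
             [0, 0, 0, 1],
             [0, 0, 0, 1]]
risk = risk_from_hazards(3, 4, [(1, 2, 0.8, 2.0)])
costs = fuse(occupancy, risk)
print(costs[1])  # → [0, 80, 160, 254]
```

Clamping risk below the lethal value keeps anticipated hazards as soft preferences: the planner detours around them when it can but is not forbidden from passing.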
- Asia > South Korea > Seoul > Seoul (0.04)
- North America > United States (0.04)
MRHaD: Mixed Reality-based Hand-Drawn Map Editing Interface for Mobile Robot Navigation
Taki, Takumi, Kobayashi, Masato, Iglesius, Eduardo, Chiba, Naoya, Shirai, Shizuka, Uranishi, Yuki
Mobile robot navigation systems are increasingly relied upon in dynamic and complex environments, yet they often struggle with map inaccuracies and the resulting inefficient path planning. This paper presents MRHaD, a Mixed Reality-based Hand-drawn Map Editing Interface that enables intuitive, real-time map modifications through natural hand gestures. By integrating an MR head-mounted display with the robotic navigation system, operators can directly create hand-drawn restricted zones (HRZ), thereby bridging the gap between 2D map representations and the real-world environment. Comparative experiments against conventional 2D editing methods demonstrate that MRHaD significantly improves editing efficiency, map accuracy, and overall usability, contributing to safer and more efficient mobile robot operations. The proposed approach provides a robust technical foundation for advancing human-robot collaboration and establishing innovative interaction models that enhance the hybrid future of robotics and human society.
I. INTRODUCTION
Recent advances in autonomous mobile robots have opened up new opportunities for human-robot collaboration in various application domains, including logistics, healthcare, and public spaces [1], [2], [3]. Typically, these robots use pre-constructed environmental maps and dynamically adjust their paths based on real-time sensing with various onboard sensors. Path planning methods are generally divided into two categories: global planning and local planning [4].
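One way a hand-drawn restricted zone could be burned into a grid map, sketched under the assumption that the MR interface delivers each zone as a closed polygon in map coordinates (meters) with the map origin at (0, 0). The helper names and the lethal value 254 are hypothetical; each cell center is tested with the standard even-odd ray-casting rule.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray cast toward +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def apply_restricted_zone(grid, polygon, resolution=1.0, lethal=254):
    """Mark every cell whose center lies inside `polygon` as lethal (in place).

    Cell grid[r][c] is assumed to span x in [c*res, (c+1)*res) and
    y in [r*res, (r+1)*res).
    """
    for r, row in enumerate(grid):
        for c in range(len(row)):
            cx, cy = (c + 0.5) * resolution, (r + 0.5) * resolution
            if point_in_polygon(cx, cy, polygon):
                row[c] = lethal
    return grid


# Example: a hypothetical hand-drawn square zone on a 4 x 4 map.
grid = [[0] * 4 for _ in range(4)]
zone = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
apply_restricted_zone(grid, zone)
print(grid[1])  # → [0, 254, 254, 0]
```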
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.05)
- North America > United States > Hawaii (0.04)
IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models
Ling, Yiyang, Owalekar, Karan, Adesanya, Oluwatobiloba, Bıyık, Erdem, Seita, Daniel
Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow) to more dangerous (e.g., toppling a glass vase). Due to this diversity, it is difficult to characterize which contacts may be acceptable or unacceptable. In this paper, we propose IMPACT, a novel motion planning framework that uses Vision-Language Models (VLMs) to infer environment semantics, identifying which parts of the environment can best tolerate contact based on object properties and locations. Our approach uses the VLM's outputs to produce a dense 3D "cost map" that encodes contact tolerances and seamlessly integrates with standard motion planners. We perform experiments using 20 simulation and 10 real-world scenes and assess using task success rate, object displacements, and feedback from human evaluators. Our results over 3620 simulation and 200 real-world trials suggest that IMPACT enables efficient contact-rich motion planning in cluttered settings while outperforming alternative methods and ablations. Supplementary material is available at https://impact-planning.github.io/.
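A sketch of how a dense 3D contact-cost map might be queried by a planner, assuming the VLM-derived costs are stored in a sparse voxel dictionary (missing voxels are free) and a candidate path is a list of 3D waypoints. `trajectory_contact_cost` and its parameters are hypothetical; the hard threshold models contacts that must never happen (the toppled vase), while tolerable contacts (the brushed pillow) are summed as soft costs.

```python
import math


def voxel_of(point, voxel_size):
    """Integer voxel index containing a 3D point."""
    return tuple(math.floor(p / voxel_size) for p in point)


def trajectory_contact_cost(waypoints, cost_map, voxel_size=0.05, step=0.01,
                            max_tolerable=None):
    """Sum contact costs over the voxels swept by a piecewise-linear path.

    cost_map: dict {(i, j, k): cost}; voxels absent from the dict are free.
    Each swept voxel is charged once. If any swept voxel's cost exceeds
    `max_tolerable`, the path is rejected with an infinite cost.
    """
    swept = set()
    for a, b in zip(waypoints, waypoints[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for s in range(n + 1):
            t = s / n
            point = [ai + t * (bi - ai) for ai, bi in zip(a, b)]
            swept.add(voxel_of(point, voxel_size))
    total = 0.0
    for v in swept:
        c = cost_map.get(v, 0.0)
        if max_tolerable is not None and c > max_tolerable:
            return math.inf
        total += c
    return total


# Hypothetical costs: a pillow (benign contact) and a vase (must not be touched).
costs = {(1, 0, 0): 0.2, (0, 1, 0): 5.0}
via_pillow = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]
via_vase = [(0.5, 0.5, 0.5), (0.5, 1.5, 0.5)]
print(trajectory_contact_cost(via_pillow, costs, voxel_size=1.0, max_tolerable=1.0))  # → 0.2
print(trajectory_contact_cost(via_vase, costs, voxel_size=1.0, max_tolerable=1.0))  # → inf
```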
Path Planning using Instruction-Guided Probabilistic Roadmaps
This work presents a novel data-driven path planning algorithm named Instruction-Guided Probabilistic Roadmap (IG-PRM). Despite the recent development and widespread use of mobile robot navigation, safe and effective travel by mobile robots still requires significant engineering effort to account for the constraints of robots and their tasks. With IG-PRM, we aim to address this problem by allowing robot operators to specify such constraints through natural language instructions, such as "aim for wider paths" or "mind small gaps". The key idea is to convert such instructions into embedding vectors using large language models (LLMs) and use the vectors as a condition to predict instruction-guided cost maps from occupancy maps. By constructing a roadmap based on the predicted costs, we can find instruction-guided paths via standard shortest-path search. Experimental results demonstrate the effectiveness of our approach in both synthetic and real-world indoor navigation environments.
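The roadmap step can be sketched as follows, assuming the instruction-conditioned cost map has already been predicted as a grid of values in [0, 1] (the network that predicts it from LLM embeddings is not reproduced) and that roadmap nodes are given in (x, y) cell coordinates. The edge weighting `length * (1 + weight * mean_cost)` and all function names are illustrative assumptions; the search itself is standard Dijkstra over the weighted roadmap.

```python
import heapq
import math


def edge_cost(cost_grid, a, b, samples=10, weight=5.0):
    """Segment length scaled by the mean cell cost sampled along a -> b.

    Returns None if the segment crosses a blocked cell (cost >= 1.0).
    """
    acc = 0.0
    for s in range(samples + 1):
        t = s / samples
        x = a[0] + t * (b[0] - a[0])
        y = a[1] + t * (b[1] - a[1])
        c = cost_grid[int(y)][int(x)]
        if c >= 1.0:
            return None
        acc += c
    return math.dist(a, b) * (1.0 + weight * acc / (samples + 1))


def build_roadmap(nodes, cost_grid, radius):
    """Connect node pairs within `radius` whose segment is not blocked."""
    graph = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= radius:
                w = edge_cost(cost_grid, nodes[i], nodes[j])
                if w is not None:
                    graph[i].append((j, w))
                    graph[j].append((i, w))
    return graph


def dijkstra(graph, start, goal):
    """Standard shortest-path search; returns the node sequence or None."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph[u]:
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


cost_grid = [[0.0, 0.0, 0.0, 0.0, 0.0],
             [0.0, 0.0, 0.9, 0.0, 0.0],  # a penalized cell, e.g. a "small gap"
             [0.0, 0.0, 0.0, 0.0, 0.0]]
nodes = [(0.5, 1.5), (4.5, 1.5), (2.5, 0.5)]  # start, goal, detour node (x, y)
graph = build_roadmap(nodes, cost_grid, radius=5.0)
print(dijkstra(graph, 0, 1))  # → [0, 2, 1]
```

The direct edge is geometrically shorter, but the penalized cell inflates its weight enough that the search prefers the detour, which is the behavior an instruction like "mind small gaps" is meant to induce.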
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Toward Integrating Semantic-aware Path Planning and Reliable Localization for UAV Operations
Canh, Thanh Nguyen, Ngo, Huy-Hoang, HoangVan, Xiem, Chong, Nak Young
Localization is one of the most crucial tasks for Unmanned Aerial Vehicle (UAV) systems, directly impacting overall performance. It can be achieved with various sensors and supports numerous tasks such as search and rescue operations, object tracking, and construction. However, in challenging environments, UAVs may lose the signals they rely on for localization. In this paper, we present an effective path-planning system that leverages semantic segmentation information from a monocular camera to navigate around texture-less and problematic areas such as lakes, oceans, and high-rise buildings. We introduce a real-time semantic segmentation architecture and a novel keyframe decision pipeline that optimizes image inputs based on pixel distribution, reducing processing time. A hierarchical planner based on the Dynamic Window Approach (DWA) algorithm, integrated with a cost map, is designed to facilitate efficient path planning. The system is implemented in a photo-realistic Unity simulation environment aligned with the segmentation model parameters. Comprehensive qualitative and quantitative evaluations validate the effectiveness of our approach, showing significant improvements in the reliability and efficiency of UAV localization in challenging environments.
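For readers unfamiliar with DWA, here is a simplified sketch of cost-map-aware velocity selection: sample (v, ω) pairs inside the dynamic window reachable under acceleration limits, forward-simulate a unicycle model, discard trajectories that leave the map or touch lethal cells, and score the rest by distance to goal plus accumulated map cost. The parameter values and scoring weights are assumptions, the classic formulation also scores heading, clearance, and velocity, and the paper's hierarchical planner is not reproduced.

```python
import math


def simulate(x, y, theta, v, w, dt=0.1, steps=10):
    """Forward-simulate a unicycle model at constant (v, w); returns (x, y) points."""
    traj = []
    for _ in range(steps):
        theta += w * dt
        x += v * math.cos(theta) * dt
        y += v * math.sin(theta) * dt
        traj.append((x, y))
    return traj


def dwa_choose(state, goal, cost_grid, v_max=1.0, w_max=1.0, acc_v=0.5,
               acc_w=1.0, dt_ctrl=1.0, n=5, k_goal=1.0, k_cost=2.0, lethal=1.0):
    """Pick (v, w) from a sampled dynamic window; grid indexed [floor(y)][floor(x)]."""
    x, y, theta, v0, w0 = state
    v_lo, v_hi = max(0.0, v0 - acc_v * dt_ctrl), min(v_max, v0 + acc_v * dt_ctrl)
    w_lo, w_hi = max(-w_max, w0 - acc_w * dt_ctrl), min(w_max, w0 + acc_w * dt_ctrl)
    rows, cols = len(cost_grid), len(cost_grid[0])
    best, best_score = None, -math.inf
    for i in range(n):
        v = v_lo + (v_hi - v_lo) * i / (n - 1)
        for j in range(n):
            w = w_lo + (w_hi - w_lo) * j / (n - 1)
            traj = simulate(x, y, theta, v, w)
            map_cost, feasible = 0.0, True
            for px, py in traj:
                r, c = math.floor(py), math.floor(px)
                if not (0 <= r < rows and 0 <= c < cols) or cost_grid[r][c] >= lethal:
                    feasible = False  # leaves the map or touches a lethal cell
                    break
                map_cost += cost_grid[r][c]
            if not feasible:
                continue
            ex, ey = traj[-1]
            score = -k_goal * math.hypot(goal[0] - ex, goal[1] - ey) - k_cost * map_cost
            if score > best_score:
                best, best_score = (v, w), score
    return best  # None if every sampled trajectory was infeasible


grid = [[0.0] * 10 for _ in range(5)]
print(dwa_choose((1.0, 2.5, 0.0, 0.5, 0.0), (8.0, 2.5), grid))  # → (1.0, 0.0)
```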
- Information Technology > Robotics & Automation (0.66)
- Aerospace & Defense (0.49)
Towards Safer Planetary Exploration: A Hybrid Architecture for Terrain Traversability Analysis in Mars Rovers
Chiuchiarelli, Achille, Franchini, Giacomo, Messina, Francesco, Chiaberge, Marcello
The field of autonomous navigation for unmanned ground vehicles (UGVs) is growing steadily, and increasing levels of autonomy have been reached in the last few years. However, the task becomes more challenging when the focus is on the exploration of planetary surfaces such as Mars. In those situations, UGVs must navigate unstable and rugged terrain, which inevitably exposes the vehicle to more hazards, accidents, and, in extreme cases, complete mission failure. This paper addresses the challenges of autonomous navigation for UGVs in planetary exploration, particularly on Mars, by introducing a hybrid architecture for terrain traversability analysis that combines two approaches: appearance-based and geometry-based. The appearance-based method uses semantic segmentation via deep neural networks to classify different terrain types. This is further refined by pixel-level terrain roughness classification obtained from the same RGB image, assigning different costs based on the physical properties of the soil. The geometry-based method complements the appearance-based approach by evaluating the terrain's geometric features, identifying hazards that the appearance-based branch may not detect. The outputs of both methods are combined into a comprehensive hybrid cost map. The proposed architecture was trained on synthetic datasets and developed as a ROS2 application for integration into broader autonomous navigation systems for harsh environments. Simulations performed in Unity show that the method can perform traversability analysis online.
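A sketch of one plausible way to fuse the two branches, assuming per-cell terrain labels and roughness scores in [0, 1] from the appearance branch and an elevation grid (meters) for the geometry branch. The class costs, the 25° slope limit, the roughness weight, and the max-fusion rule are hypothetical choices for illustration; the abstract does not specify how the two maps are combined.

```python
import math

# Hypothetical per-class traversal costs in [0, 1] for the appearance branch.
CLASS_COST = {"soil": 0.1, "bedrock": 0.2, "sand": 0.5, "big_rock": 1.0}


def slope_deg(elev, r, c, res):
    """Slope magnitude in degrees at interior cell (r, c) via central differences."""
    dzdx = (elev[r][c + 1] - elev[r][c - 1]) / (2 * res)
    dzdy = (elev[r + 1][c] - elev[r - 1][c]) / (2 * res)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))


def hybrid_cost(labels, roughness, elev, res=0.1, max_slope=25.0, k_rough=0.5):
    """Fuse appearance cost (class + roughness) and geometric cost (slope) by max.

    Border cells, where central differences are undefined, are left at 1.0
    (treated as untraversable) for simplicity.
    """
    rows, cols = len(labels), len(labels[0])
    out = [[1.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            appearance = min(1.0, CLASS_COST[labels[r][c]] + k_rough * roughness[r][c])
            s = slope_deg(elev, r, c, res)
            geometry = 1.0 if s >= max_slope else s / max_slope
            out[r][c] = max(appearance, geometry)
    return out


labels = [["sand"] * 3 for _ in range(3)]
smooth = [[0.0] * 3 for _ in range(3)]
flat = [[0.0] * 3 for _ in range(3)]
ramp = [[0.1 * c for c in range(3)] for _ in range(3)]  # 45-degree ramp at res = 0.1 m
print(hybrid_cost(labels, smooth, flat)[1][1])  # → 0.5 (appearance dominates)
print(hybrid_cost(labels, smooth, ramp)[1][1])  # → 1.0 (slope exceeds the limit)
```

Taking the maximum means either branch alone can veto a cell, which matches the stated goal of letting geometry catch hazards that appearance misses; a weighted sum would instead let a benign-looking surface mask a steep slope.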
- Europe > Italy > Piedmont > Turin Province > Turin (0.14)
- Europe > Italy > Lombardy > Milan (0.06)
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Words to Wheels: Vision-Based Autonomous Driving Understanding Human Language Instructions Using Foundation Models
Ryu, Chanhoe, Seong, Hyunki, Lee, Daegyu, Moon, Seongwoo, Min, Sungjae, Shim, D. Hyunchul
This paper introduces an innovative application of foundation models that enables Unmanned Ground Vehicles (UGVs) equipped with an RGB-D camera to navigate to designated destinations based on human language instructions. Unlike learning-based methods, this approach requires no prior training; instead, it leverages existing foundation models, facilitating generalization to novel environments. Upon receipt, human language instructions are transformed by a large language model (LLM) into a 'cognitive route description': a detailed navigation route expressed in human language. The vehicle then decomposes this description into landmarks and navigation maneuvers. The vehicle also determines elevation costs and identifies the navigability levels of different regions through a terrain segmentation model, GANav, trained on open datasets. Semantic elevation costs, which take both elevation and navigability levels into account, are estimated and provided to the Model Predictive Path Integral (MPPI) planner, which is responsible for local path planning. Concurrently, the vehicle searches for target landmarks using foundation models, including YOLO-World and EfficientViT-SAM. Ultimately, the vehicle executes the navigation commands to reach the designated destination, the final landmark. Our experiments demonstrate that this application successfully guides UGVs to their destinations following human language instructions in novel environments, such as unfamiliar terrain or urban settings.
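The MPPI planner mentioned above can be illustrated with a compact update for a 2D point mass: sample perturbed control sequences, roll them out through a cost function, and average the perturbations with weights exp(-(cost - min_cost)/λ). Here `semantic_elevation_cost` is a hypothetical stand-in for the semantic elevation cost; the velocity-controlled dynamics, horizon, terminal goal weight, and omission of the control-effort term are simplifying assumptions, not the paper's settings.

```python
import math
import random


def rollout_cost(start, controls, cost_fn, goal, dt=0.1, k_goal=10.0):
    """Integrate a 2D point mass under velocity controls and accumulate costs."""
    x, y = start
    total = 0.0
    for ux, uy in controls:
        x, y = x + ux * dt, y + uy * dt
        total += cost_fn(x, y)  # running cost, e.g. a semantic elevation cost
    return total + k_goal * math.hypot(goal[0] - x, goal[1] - y)  # terminal cost


def mppi_step(start, nominal, cost_fn, goal, samples=200, sigma=0.5, lam=1.0,
              rng=None):
    """One MPPI update: perturb, roll out, and average noise with softmin weights."""
    rng = rng or random.Random(0)
    noises, costs = [], []
    for _ in range(samples):
        eps = [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in nominal]
        controls = [(u[0] + e[0], u[1] + e[1]) for u, e in zip(nominal, eps)]
        noises.append(eps)
        costs.append(rollout_cost(start, controls, cost_fn, goal))
    beta = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - beta) / lam) for c in costs]
    z = sum(weights)
    updated = []
    for t, (ux, uy) in enumerate(nominal):
        dx = sum(w * n[t][0] for w, n in zip(weights, noises)) / z
        dy = sum(w * n[t][1] for w, n in zip(weights, noises)) / z
        updated.append((ux + dx, uy + dy))
    return updated


def semantic_elevation_cost(x, y):
    """Hypothetical running cost: a steep, poorly navigable band at y > 1."""
    return 5.0 if y > 1.0 else 0.0


rng = random.Random(42)
plan = [(0.0, 0.0)] * 10
for _ in range(20):
    plan = mppi_step((0.0, 0.0), plan, semantic_elevation_cost, (5.0, 0.0), rng=rng)
print(plan[0])  # first control to execute; its x component is expected to be positive
```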
- North America > United States (0.04)
- Asia > South Korea > Daejeon > Daejeon (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report (1.00)
- Overview > Innovation (0.34)
- Transportation > Ground > Road (0.51)
- Information Technology > Robotics & Automation (0.51)
- Automobiles & Trucks (0.51)
BehAV: Behavioral Rule Guided Autonomy Using VLMs for Robot Navigation in Outdoor Scenes
Weerakoon, Kasun, Elnoor, Mohamed, Seneviratne, Gershom, Rajagopal, Vignesh, Arul, Senthil Hariharan, Liang, Jing, Jaffar, Mohamed Khalid M, Manocha, Dinesh
We present BehAV, a novel approach for autonomous robot navigation in outdoor scenes guided by human instructions and leveraging Vision Language Models (VLMs). Our method interprets human commands using a Large Language Model (LLM) and categorizes the instructions into navigation and behavioral guidelines. Navigation guidelines consist of directional commands (e.g., "move forward until") and associated landmarks (e.g., "the building with blue windows"), while behavioral guidelines encompass regulatory actions (e.g., "stay on") and their corresponding objects (e.g., "pavements"). We use VLMs for their zero-shot scene understanding capabilities to estimate landmark locations from RGB images for robot navigation. Further, we introduce a novel scene representation that utilizes VLMs to ground behavioral rules into a behavioral cost map. This cost map encodes the presence of behavioral objects within the scene and assigns costs based on their regulatory actions. The behavioral cost map is integrated with a LiDAR-based occupancy map for navigation. To navigate outdoor scenes while adhering to the instructed behaviors, we present an unconstrained Model Predictive Control (MPC)-based planner that prioritizes both reaching landmarks and following behavioral guidelines. We evaluate the performance of BehAV on a quadruped robot across diverse real-world scenarios, demonstrating a 22.49% improvement in alignment with human-teleoperated actions, as measured by Fréchet distance, and achieving a 40% higher navigation success rate compared to state-of-the-art methods.
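A minimal sketch of grounding behavioral rules into a cost map, assuming the VLM step has already produced a boolean segmentation mask per behavioral object (e.g. "pavements") and that the LLM has parsed each rule into an (action, object) pair. Only "stay on" and "avoid" are handled; the function name and cost levels are hypothetical. Occupied LiDAR cells override everything with a lethal cost.

```python
def behavioral_cost_map(masks, rules, occupancy, off_target=0.6, avoided=0.9,
                        lethal=1.0):
    """Ground (action, object) rules into a cost grid in [0, 1].

    masks: {object_name: 2D bool grid}, assumed to come from VLM segmentation.
    "stay on" penalizes cells OFF the object; "avoid" penalizes cells ON it.
    Cells occupied in the LiDAR occupancy grid are always lethal.
    """
    rows, cols = len(occupancy), len(occupancy[0])
    cost = [[0.0] * cols for _ in range(rows)]
    for action, obj in rules:
        if action not in ("stay on", "avoid"):
            raise ValueError(f"unsupported regulatory action: {action!r}")
        mask = masks[obj]
        for r in range(rows):
            for c in range(cols):
                if action == "stay on" and not mask[r][c]:
                    cost[r][c] = max(cost[r][c], off_target)
                elif action == "avoid" and mask[r][c]:
                    cost[r][c] = max(cost[r][c], avoided)
    for r in range(rows):
        for c in range(cols):
            if occupancy[r][c]:
                cost[r][c] = lethal
    return cost


masks = {"pavements": [[True, True, True], [False, False, False]]}
occupancy = [[0, 0, 1], [0, 0, 0]]
cmap = behavioral_cost_map(masks, [("stay on", "pavements")], occupancy)
print(cmap)  # → [[0.0, 0.0, 1.0], [0.6, 0.6, 0.6]]
```

Keeping rule violations below the lethal value lets an MPC-style planner leave the pavement briefly when the only alternative is a collision, while still preferring compliant trajectories.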
- Energy > Oil & Gas (0.54)
- Transportation (0.46)
QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds
Wang, Ye, Mei, Yuting, Zheng, Sipeng, Jin, Qin
While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-making; ii) mastering agile control of locomotion and path planning; and iii) developing advanced cognition to execute long-term objectives. QuadrupedGPT processes human commands and environmental contexts using a large multimodal model (LMM). Empowered by its extensive knowledge base, our agent autonomously assigns appropriate parameters for adaptive locomotion policies and plans a safe but efficient path toward the goal using semantic-aware terrain analysis. Moreover, QuadrupedGPT is equipped with problem-solving capabilities that enable it to decompose long-term goals into a sequence of executable subgoals through high-level reasoning. Extensive experiments across various benchmarks confirm that QuadrupedGPT can adeptly handle multiple tasks with intricate instructions, demonstrating a significant step toward versatile quadruped agents in open-ended worlds. Our website and code can be found at https://quadruped-hub.github.io/Quadruped-GPT/.
- Asia > China > Beijing > Beijing (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
BSL: Navigation Method Considering Blind Spots Based on ROS Navigation Stack and Blind Spots Layer for Mobile Robot
Kobayashi, Masato, Motoi, Naoki
This paper proposes a navigation method that considers blind spots, based on the Robot Operating System (ROS) navigation stack and a blind spots layer (BSL), for a wheeled mobile robot. Environmental information is recognized using a laser range finder (LRF) and RGB-D cameras. Blind spots occur when corners or obstacles are present in the environment and may lead to collisions if a human or object moves toward the robot from them. To prevent such collisions, this paper proposes a navigation method for the wheeled mobile robot that accounts for blind spots through the BSL, a layer of the local cost map. Blind spots are estimated using environmental data collected by the RGB-D cameras. Navigation that takes these blind spots into account is achieved by implementing the BSL together with a local path planning method that employs an enhanced cost function for the dynamic window approach. The effectiveness of the proposed method was demonstrated through simulations and experiments.
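A hypothetical sketch of a blind-spot cost layer, assuming blind-spot locations (e.g. occluded corners) have already been estimated as grid cells. Each spot contributes a Gaussian bump truncated at `radius`, overlapping bumps are combined with a per-cell maximum, and `blind_spot_penalty` shows how such a layer could enter a DWA-style trajectory cost as an extra term. The peak, sigma, and radius values and both function names are illustrative assumptions, not the paper's cost function.

```python
import math


def blind_spot_layer(rows, cols, spots, peak=200, sigma=1.0, radius=3.0):
    """Cost layer with a truncated Gaussian bump around each blind-spot cell.

    spots: list of (row, col) cells, assumed to be estimated beforehand
    (e.g. occluded corners detected from RGB-D data).
    """
    layer = [[0] * cols for _ in range(rows)]
    for sr, sc in spots:
        for r in range(rows):
            for c in range(cols):
                d = math.hypot(r - sr, c - sc)
                if d <= radius:
                    val = round(peak * math.exp(-d * d / (2 * sigma * sigma)))
                    layer[r][c] = max(layer[r][c], val)
    return layer


def blind_spot_penalty(cells, layer):
    """Extra trajectory cost term: sum of layer values over the visited cells."""
    return sum(layer[r][c] for r, c in cells)


layer = blind_spot_layer(7, 7, [(3, 3)])
print(layer[3][3], layer[3][4], layer[0][0])  # → 200 121 0
near = blind_spot_penalty([(3, 2), (3, 3)], layer)
far = blind_spot_penalty([(0, 0), (0, 1)], layer)
print(near > far)  # → True
```

Because the penalty is added to, rather than replacing, the usual DWA score, a trajectory that skirts a corner is still allowed but ranks below an otherwise similar one that keeps its distance.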